MetaBoot: a machine learning framework of taxonomical biomarker discovery for different microbial communities based on metagenomic data

Authors

  • Xiaojun Wang
  • Xiaoquan Su
  • Xinping Cui
  • Kang Ning
  • Yong Wang
Abstract

Because more than 90% of species in a microbial community cannot be isolated and cultivated, metagenomic methods have become some of the most important approaches for analyzing a microbial community as a whole. With the rapid accumulation of metagenomic samples and the advance of next-generation sequencing techniques, it is now possible to qualitatively and quantitatively assess all taxa (features) in a microbial community. A set of taxa, characterized by presence/absence or by differing abundances, could potentially serve as taxonomical biomarkers for identifying the corresponding microbial community's phenotype. Although some bioinformatics methods exist for metagenomic biomarker discovery, current methods are not sufficiently robust, accurate, or fast at selecting non-redundant biomarkers for predicting a microbial community's phenotype. In this study, we propose a novel method, MetaBoot, that combines mRMR (minimal redundancy maximal relevance) with bootstrapping to discover non-redundant biomarkers for microbial communities by mining metagenomic data. MetaBoot has been tested and compared with other methods on well-designed simulated datasets based on normal and gamma distributions, as well as on publicly available metagenomic datasets. The results show that MetaBoot is robust across datasets of varied complexity and taxonomical distribution patterns, and that it selects discriminative biomarkers with high accuracy and biological consistency. MetaBoot is therefore suitable for robust and accurate discovery of taxonomical biomarkers for different microbial communities.
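The combination of mRMR and bootstrapping described in the abstract can be illustrated with a minimal sketch. This is not MetaBoot's actual implementation, which the abstract does not detail. It assumes a median-based binarization of taxon abundances, the "difference" form of mRMR (relevance minus mean redundancy, both measured by mutual information), and hypothetical function names (`mutual_information`, `binarize`, `mrmr_select`, `bootstrap_mrmr`). Taxa that are selected in a large fraction of bootstrap resamples would be the candidate biomarkers.

```python
import math
import random
import statistics
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def binarize(column):
    """Assumed discretization: 1 if the abundance exceeds the column median, else 0."""
    median = statistics.median(column)
    return [1 if v > median else 0 for v in column]


def mrmr_select(columns, labels, k):
    """Greedy mRMR (difference form) over discrete feature columns.

    At each step, pick the unselected feature that maximizes
    relevance to the labels minus mean redundancy with features already chosen.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    while len(selected) < min(k, len(columns)):
        best, best_score = None, -math.inf
        for j, col in enumerate(columns):
            if j in selected:
                continue
            redundancy = 0.0
            if selected:
                redundancy = sum(mutual_information(col, columns[s])
                                 for s in selected) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


def bootstrap_mrmr(rows, labels, k, n_boot=100, seed=0):
    """Fraction of bootstrap resamples in which each taxon (column index) is selected.

    rows[i][j] is the abundance of taxon j in sample i; labels[i] is its phenotype.
    """
    rng = random.Random(seed)
    n, n_taxa = len(rows), len(rows[0])
    counts = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample samples with replacement
        boot_labels = [labels[i] for i in idx]
        columns = [binarize([rows[i][j] for i in idx]) for j in range(n_taxa)]
        counts.update(mrmr_select(columns, boot_labels, k))
    return {j: counts[j] / n_boot for j in range(n_taxa)}
```

Calling `bootstrap_mrmr(rows, labels, k=2)` on an abundance table returns a selection frequency per taxon. Ranking taxa by that frequency is one plausible way to turn the per-resample selections into a stable biomarker list.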

Similar resources

Parallel-META 2.0: Enhanced Metagenomic Data Analysis with Functional Annotation, High Performance Computing and Advanced Visualization

The metagenomic method directly sequences and analyses genome information from microbial communities. The main computational tasks for metagenomic analyses include taxonomical and functional structure analysis for all genomes in a microbial community (also referred to as a metagenomic sample). With the advancement of Next Generation Sequencing (NGS) techniques, the number of metagenomic samples...

Machine learning for metagenomics: methods and tools

Owing to the complexity and variability of metagenomic studies, modern machine learning approaches have seen increased usage to answer a variety of questions encompassing the full range of metagenomic NGS data analysis. We review here the contribution of machine learning techniques for the field of metagenomics, by presenting known successful approaches in a unified framework. This review focuse...

MixMC: A Multivariate Statistical Framework to Gain Insight into Microbial Communities

Culture-independent techniques, such as shotgun metagenomics and 16S rRNA amplicon sequencing, have dramatically changed the way we can examine microbial communities. Recently, changes in microbial community structure and dynamics have been associated with a growing list of human diseases. The identification and comparison of bacteria driving those changes requires the development of sound stati...

Meta-Storms: efficient search for similar microbial communities based on a novel indexing scheme and similarity score for metagenomic data

BACKGROUND Scientists have long been intrigued by how to effectively compare different microbial communities (also referred to as 'metagenomic samples' here) on a large scale: given a set of unknown samples, find similar metagenomic samples from a large repository and examine how similar these samples are. With the current metagenomic samples accumulated, it is possible to build a database of metageno...

A New Approach for Scalable Analysis of Microbial Communities

Motivation: Microbial communities play important roles in the function and maintenance of various biosystems, ranging from the human body to the environment. Current methods for analysis of microbial communities are typically based on taxonomic phylogenetic alignment using 16S rRNA metagenomic or Whole Genome Sequencing data. In typical characterizations of microbial communities, studies deal with ...

Journal title:

Volume 3  Issue 

Pages  -

Publication date 2015